Purpose: Briefly outline the purpose of your experiment.
Hypothesis: Outline the hypotheses if applicable.
Predictions: Outline predictions, if applicable.
Data: Briefly describe the type of data collected.
To get digital distances: the TPS coordinates were in long format, and I thought it would be easiest to locate the values needed for the Euclidean distance formula in wide format. So I first cleaned up the original TPS file: I removed all curve data and unnecessary rows (i.e., LM=16, image=) to be left with a column of IDs, a column of landmark labels, and two columns for the coordinates.
Importing that into R, I then used the pivot_wider() function to create one column for IDs, 16 columns for the X coordinates of all the landmarks, and 16 columns for the Y coordinates of all the landmarks (I originally wanted a single coordinates column, with two rows per specimen to minimize the number of columns, but this works). I separated this into two data frames, one with just the X coordinates and one with just the Y, to make them easier to reference.
I calculated the Euclidean distance (formula: sqrt((X_LM2 - X_LM1)^2 + (Y_LM2 - Y_LM1)^2)) for most of the same measurements (only missing total length and head width, as these were not possible in the digital photos). For head depth, I wasn't sure whether 2->11 or 2->12 was closer to what I measured by hand; same for body depth, where I wasn't sure whether it was 3->10 or 3->11. I included both.
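The repeated square-root expressions below could also be wrapped in a small helper function; this is only a sketch (the name `euclid` is mine, not from the original script):

```r
# Hypothetical helper: Euclidean distance between two landmarks.
euclid <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# Worked example: points (0, 0) and (3, 4) are 5 units apart.
euclid(0, 0, 3, 4)  # 5
```

With this, each distance below would read, e.g., euclid(x$X.COR_LM1, y$Y.COR_LM1, x$X.COR_LM2, y$Y.COR_LM2).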
I finished all of the calculations and added the values to a data frame with the IDs from the original TPS file, then exported it to Excel.
Once in Excel, I realized that the values I had were in pixels (I think?) rather than mm, so I needed a way to convert them in order to compare to the hand measurements. I found out that the scale factor (included in the original TPS file) is in mm/pixel, so I added this to the spreadsheet and multiplied each value by it to get mm. However, the results were off by a factor of ten (e.g., 4.5 rather than 45), so I multiplied by 10 to get numbers that resembled the hand measurements. I'm not 100% sure this is correct, or why I would need to multiply by 10, but I'm going with it for now.
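The conversion step, including the puzzling extra factor of 10, can be sketched as follows; every number here is made up purely for illustration:

```r
# Made-up values for illustration only.
dist_px <- 32.0          # Euclidean distance in pixels
scale_mm_per_px <- 0.14  # SCALE factor from the TPS file (mm/pixel)

dist_mm <- dist_px * scale_mm_per_px  # about 4.48 (an order of magnitude low)
dist_adj <- dist_mm * 10              # about 44.8 (resembles the hand values)
```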
library(tidyverse)
library(curl)
raw <- curl("https://raw.githubusercontent.com/allisondavis/morphology_analysis/refs/heads/master/dig-dist.csv")
raw <- read.csv(raw, header = TRUE, sep = ",", stringsAsFactors = TRUE)
raw1 <- raw %>% pivot_wider(names_from = LABELS, values_from = c(X.COR, Y.COR))
x <- raw1[, 1:17]
y <- raw1[, c(1, 18:33)]
HL.dig <- sqrt((x$X.COR_LM2 - x$X.COR_LM1)^2 + (y$Y.COR_LM2 - y$Y.COR_LM1)^2)
SL.dig <- sqrt((x$X.COR_LM6 - x$X.COR_LM1)^2 + (y$Y.COR_LM6 - y$Y.COR_LM1)^2)
PreDL.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM2)^2 + (y$Y.COR_LM3 - y$Y.COR_LM2)^2)
DbL.dig <- sqrt((x$X.COR_LM4 - x$X.COR_LM3)^2 + (y$Y.COR_LM4 - y$Y.COR_LM3)^2)
CPD.dig <- sqrt((x$X.COR_LM7 - x$X.COR_LM5)^2 + (y$Y.COR_LM7 - y$Y.COR_LM5)^2)
HD1.dig <- sqrt((x$X.COR_LM11 - x$X.COR_LM2)^2 + (y$Y.COR_LM11 - y$Y.COR_LM2)^2)
HD2.dig <- sqrt((x$X.COR_LM12 - x$X.COR_LM2)^2 + (y$Y.COR_LM12 - y$Y.COR_LM2)^2)
CPL.dig <- sqrt((x$X.COR_LM8 - x$X.COR_LM6)^2 + (y$Y.COR_LM8 - y$Y.COR_LM6)^2)
BD1.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM10)^2 + (y$Y.COR_LM3 - y$Y.COR_LM10)^2)
BD2.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM11)^2 + (y$Y.COR_LM3 - y$Y.COR_LM11)^2)
SnL.dig <- sqrt((x$X.COR_LM15 - x$X.COR_LM1)^2 + (y$Y.COR_LM15 - y$Y.COR_LM1)^2)
OL.dig <- sqrt((x$X.COR_LM16 - x$X.COR_LM15)^2 + (y$Y.COR_LM16 - y$Y.COR_LM15)^2)
raw2 <- cbind(raw1[,1], SL.dig, BD1.dig, BD2.dig, CPD.dig, CPL.dig, PreDL.dig, DbL.dig, HL.dig, HD1.dig, HD2.dig, SnL.dig, OL.dig)
library(writexl)
write_xlsx(raw2, "C:/Users/allis/OneDrive/Desktop/comparison_data.xlsx")
Performed most analyses in MorphoJ. The analyses here are strictly for comparing digital vs. hand measurements.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(curl)
## Using libcurl 8.3.0 with Schannel
##
## Attaching package: 'curl'
##
## The following object is masked from 'package:readr':
##
## parse_date
raw <- curl("https://raw.githubusercontent.com/allisondavis/morphology_analysis/refs/heads/master/comparison_data.csv")
raw <- read.csv(raw, header = TRUE, sep = ",", stringsAsFactors = TRUE)
# AD20-082 is missing all manual values, so I will remove it to avoid issues with NAs
raw <- raw[raw$ID != "AD20-082", ]
I will use Shapiro-Wilk tests, histograms, and QQ plots to determine which traits are normal.
I will separate my raw dataset by the two factors of interest: digital and manual measurements.
library(dplyr)
digital <- raw %>%
select(ID, SPP, ends_with(".dig"))
manual <- raw %>%
select(ID, SPP, ends_with(".man"))
All traits but OL fail the Shapiro-Wilk test, with a slight skew/deviation to the right.
##### Shapiro-Wilk test #####
shapiro.test(digital$SL)
##
## Shapiro-Wilk normality test
##
## data: digital$SL
## W = 0.97473, p-value = 1.165e-05
shapiro.test(digital$BD1)
##
## Shapiro-Wilk normality test
##
## data: digital$BD1
## W = 0.98196, p-value = 0.0002967
shapiro.test(digital$BD2)
##
## Shapiro-Wilk normality test
##
## data: digital$BD2
## W = 0.9907, p-value = 0.03071
shapiro.test(digital$CPD)
##
## Shapiro-Wilk normality test
##
## data: digital$CPD
## W = 0.96472, p-value = 2.6e-07
shapiro.test(digital$CPL)
##
## Shapiro-Wilk normality test
##
## data: digital$CPL
## W = 0.95444, p-value = 9.409e-09
shapiro.test(digital$PreDL)
##
## Shapiro-Wilk normality test
##
## data: digital$PreDL
## W = 0.98017, p-value = 0.0001268
shapiro.test(digital$DbL)
##
## Shapiro-Wilk normality test
##
## data: digital$DbL
## W = 0.9864, p-value = 0.002823
shapiro.test(digital$HL)
##
## Shapiro-Wilk normality test
##
## data: digital$HL
## W = 0.95243, p-value = 5.184e-09
shapiro.test(digital$HD1)
##
## Shapiro-Wilk normality test
##
## data: digital$HD1
## W = 0.9765, p-value = 2.463e-05
shapiro.test(digital$HD2)
##
## Shapiro-Wilk normality test
##
## data: digital$HD2
## W = 0.97148, p-value = 3.143e-06
shapiro.test(digital$SnL)
##
## Shapiro-Wilk normality test
##
## data: digital$SnL
## W = 0.97219, p-value = 4.145e-06
shapiro.test(digital$OL)
##
## Shapiro-Wilk normality test
##
## data: digital$OL
## W = 0.99609, p-value = 0.5705
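A side note: the twelve separate calls above could be condensed into one pass. This is just a sketch; `digital` below is a random stand-in for the real data frame (ID, SPP, then twelve trait columns in positions 3:14):

```r
# Random stand-in for the `digital` data frame built above.
set.seed(1)
digital <- data.frame(ID = 1:50, SPP = "x",
                      matrix(rnorm(50 * 12), ncol = 12))

# Run Shapiro-Wilk on every trait column at once and collect the p-values.
sw_p <- sapply(digital[, 3:14], function(v) shapiro.test(v)$p.value)
round(sw_p, 5)
```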
##### Histograms #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
hist(digital$SL, breaks=30)
hist(digital$BD1, breaks=30)
hist(digital$BD2, breaks=30)
hist(digital$CPD, breaks=30)
hist(digital$CPL, breaks=30)
hist(digital$PreDL, breaks=30)
hist(digital$DbL, breaks=30)
hist(digital$HL, breaks=30)
hist(digital$HD1, breaks=30)
hist(digital$HD2, breaks=30)
hist(digital$SnL, breaks=30)
hist(digital$OL, breaks=30)
##### QQ plots #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
qqnorm(digital$SL)
qqline(digital$SL)
qqnorm(digital$BD1)
qqline(digital$BD1)
qqnorm(digital$BD2)
qqline(digital$BD2)
qqnorm(digital$CPD)
qqline(digital$CPD)
qqnorm(digital$CPL)
qqline(digital$CPL)
qqnorm(digital$PreDL)
qqline(digital$PreDL)
qqnorm(digital$DbL)
qqline(digital$DbL)
qqnorm(digital$HL)
qqline(digital$HL)
qqnorm(digital$HD1)
qqline(digital$HD1)
qqnorm(digital$HD2)
qqline(digital$HD2)
qqnorm(digital$SnL)
qqline(digital$SnL)
qqnorm(digital$OL)
qqline(digital$OL)
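Likewise, the repeated QQ-plot calls could be written as a loop over the trait columns; again a sketch with a random stand-in for `digital` (and a null graphics device so it runs non-interactively):

```r
# Random stand-in for the `digital` data frame built above.
set.seed(1)
digital <- data.frame(ID = 1:50, SPP = "x",
                      matrix(rnorm(50 * 12), ncol = 12))

pdf(NULL)  # null device so the sketch runs without opening a window
par(mfcol = c(2, 2), oma = c(0, 0, 2, 0))
for (trait in names(digital)[3:14]) {
  qqnorm(digital[[trait]], main = trait)  # QQ plot per trait
  qqline(digital[[trait]])                # reference line per trait
}
invisible(dev.off())
```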
All manual traits fail the Shapiro-Wilk test, with a slight skew/deviation to the right.
##### Shapiro-Wilk test #####
shapiro.test(manual$SL)
##
## Shapiro-Wilk normality test
##
## data: manual$SL
## W = 0.98036, p-value = 0.0001385
shapiro.test(manual$BD1)
##
## Shapiro-Wilk normality test
##
## data: manual$BD1
## W = 0.96494, p-value = 2.806e-07
shapiro.test(manual$CPD)
##
## Shapiro-Wilk normality test
##
## data: manual$CPD
## W = 0.97037, p-value = 2.046e-06
shapiro.test(manual$CPL)
##
## Shapiro-Wilk normality test
##
## data: manual$CPL
## W = 0.97427, p-value = 9.619e-06
shapiro.test(manual$PreDL)
##
## Shapiro-Wilk normality test
##
## data: manual$PreDL
## W = 0.98037, p-value = 0.0001391
shapiro.test(manual$DbL)
##
## Shapiro-Wilk normality test
##
## data: manual$DbL
## W = 0.98347, p-value = 0.00062
shapiro.test(manual$HL)
##
## Shapiro-Wilk normality test
##
## data: manual$HL
## W = 0.97032, p-value = 2.009e-06
shapiro.test(manual$HD1)
##
## Shapiro-Wilk normality test
##
## data: manual$HD1
## W = 0.98378, p-value = 0.0007249
shapiro.test(manual$SnL)
##
## Shapiro-Wilk normality test
##
## data: manual$SnL
## W = 0.97615, p-value = 2.113e-05
shapiro.test(manual$OL)
##
## Shapiro-Wilk normality test
##
## data: manual$OL
## W = 0.98828, p-value = 0.007818
##### Histograms #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
hist(manual$SL, breaks=30)
hist(manual$BD1, breaks=30)
hist(manual$CPD, breaks=30)
hist(manual$CPL, breaks=30)
hist(manual$PreDL, breaks=30)
hist(manual$DbL, breaks=30)
hist(manual$HL, breaks=30)
hist(manual$HD1, breaks=30)
hist(manual$SnL, breaks=30)
hist(manual$OL, breaks=30)
##### QQ plots #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
qqnorm(manual$SL)
qqline(manual$SL)
qqnorm(manual$BD1)
qqline(manual$BD1)
qqnorm(manual$CPD)
qqline(manual$CPD)
qqnorm(manual$CPL)
qqline(manual$CPL)
qqnorm(manual$PreDL)
qqline(manual$PreDL)
qqnorm(manual$DbL)
qqline(manual$DbL)
qqnorm(manual$HL)
qqline(manual$HL)
qqnorm(manual$HD1)
qqline(manual$HD1)
qqnorm(manual$SnL)
qqline(manual$SnL)
qqnorm(manual$OL)
qqline(manual$OL)
While some values are still significant, log transformation vastly improves normality (e.g., p = 2e-16 to p = 0.002). The histograms and QQ plots look very close to normal, so I will stick with log-transformed values.
Log transformed data sets
dig_trans <- digital
dig_trans[, c(3:14)] <- log(dig_trans[, c(3:14)])
man_trans <- manual
man_trans[, c(3:12)] <- log(man_trans[, c(3:12)])
OL was the only trait that was normal in the raw digital check, but since it was not normal in the manual data, and I want to compare digital to manual, I will transform OL here too (I don't want to compare raw to transformed measurements).
##### Shapiro-Wilk test #####
shapiro.test(dig_trans$SL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$SL
## W = 0.99403, p-value = 0.2055
shapiro.test(dig_trans$BD1)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$BD1
## W = 0.9938, p-value = 0.1809
shapiro.test(dig_trans$BD2)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$BD2
## W = 0.98978, p-value = 0.0181
shapiro.test(dig_trans$CPD)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$CPD
## W = 0.99187, p-value = 0.06017
shapiro.test(dig_trans$CPL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$CPL
## W = 0.99116, p-value = 0.03989
shapiro.test(dig_trans$PreDL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$PreDL
## W = 0.99151, p-value = 0.04877
shapiro.test(dig_trans$DbL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$DbL
## W = 0.98425, p-value = 0.0009222
shapiro.test(dig_trans$HL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$HL
## W = 0.99129, p-value = 0.04319
shapiro.test(dig_trans$HD1)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$HD1
## W = 0.99687, p-value = 0.758
shapiro.test(dig_trans$HD2)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$HD2
## W = 0.99479, p-value = 0.3068
shapiro.test(dig_trans$SnL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$SnL
## W = 0.99732, p-value = 0.8576
shapiro.test(dig_trans$OL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$OL
## W = 0.98989, p-value = 0.01936
##### Histograms #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
hist(dig_trans$SL, breaks=30)
hist(dig_trans$BD1, breaks=30)
hist(dig_trans$BD2, breaks=30)
hist(dig_trans$CPD, breaks=30)
hist(dig_trans$CPL, breaks=30)
hist(dig_trans$PreDL, breaks=30)
hist(dig_trans$DbL, breaks=30)
hist(dig_trans$HL, breaks=30)
hist(dig_trans$HD1, breaks=30)
hist(dig_trans$HD2, breaks=30)
hist(dig_trans$SnL, breaks=30)
hist(dig_trans$OL, breaks=30)
##### QQ plots #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
qqnorm(dig_trans$SL)
qqline(dig_trans$SL)
qqnorm(dig_trans$BD1)
qqline(dig_trans$BD1)
qqnorm(dig_trans$BD2)
qqline(dig_trans$BD2)
qqnorm(dig_trans$CPD)
qqline(dig_trans$CPD)
qqnorm(dig_trans$CPL)
qqline(dig_trans$CPL)
qqnorm(dig_trans$PreDL)
qqline(dig_trans$PreDL)
qqnorm(dig_trans$DbL)
qqline(dig_trans$DbL)
qqnorm(dig_trans$HL)
qqline(dig_trans$HL)
qqnorm(dig_trans$HD1)
qqline(dig_trans$HD1)
qqnorm(dig_trans$HD2)
qqline(dig_trans$HD2)
qqnorm(dig_trans$SnL)
qqline(dig_trans$SnL)
qqnorm(dig_trans$OL)
qqline(dig_trans$OL)
##### Shapiro-Wilk test #####
shapiro.test(man_trans$SL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$SL
## W = 0.99424, p-value = 0.2297
shapiro.test(man_trans$BD1)
##
## Shapiro-Wilk normality test
##
## data: man_trans$BD1
## W = 0.99567, p-value = 0.4747
shapiro.test(man_trans$CPD)
##
## Shapiro-Wilk normality test
##
## data: man_trans$CPD
## W = 0.99598, p-value = 0.5428
shapiro.test(man_trans$CPL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$CPL
## W = 0.98966, p-value = 0.01693
shapiro.test(man_trans$PreDL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$PreDL
## W = 0.98692, p-value = 0.003722
shapiro.test(man_trans$DbL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$DbL
## W = 0.98849, p-value = 0.008812
shapiro.test(man_trans$HL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$HL
## W = 0.99081, p-value = 0.03262
shapiro.test(man_trans$HD1)
##
## Shapiro-Wilk normality test
##
## data: man_trans$HD1
## W = 0.99264, p-value = 0.09385
shapiro.test(man_trans$SnL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$SnL
## W = 0.99033, p-value = 0.02488
shapiro.test(man_trans$OL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$OL
## W = 0.97981, p-value = 0.0001072
##### Histograms #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
hist(man_trans$SL, breaks=30)
hist(man_trans$BD1, breaks=30)
hist(man_trans$CPD, breaks=30)
hist(man_trans$CPL, breaks=30)
hist(man_trans$PreDL, breaks=30)
hist(man_trans$DbL, breaks=30)
hist(man_trans$HL, breaks=30)
hist(man_trans$HD1, breaks=30)
hist(man_trans$SnL, breaks=30)
hist(man_trans$OL, breaks=30)
##### QQ plots #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
qqnorm(man_trans$SL)
qqline(man_trans$SL)
qqnorm(man_trans$BD1)
qqline(man_trans$BD1)
qqnorm(man_trans$CPD)
qqline(man_trans$CPD)
qqnorm(man_trans$CPL)
qqline(man_trans$CPL)
qqnorm(man_trans$PreDL)
qqline(man_trans$PreDL)
qqnorm(man_trans$DbL)
qqline(man_trans$DbL)
qqnorm(man_trans$HL)
qqline(man_trans$HL)
qqnorm(man_trans$HD1)
qqline(man_trans$HD1)
qqnorm(man_trans$SnL)
qqline(man_trans$SnL)
qqnorm(man_trans$OL)
qqline(man_trans$OL)
I will perform three different analyses: ANOVA, Bland-Altman, and PCA. These vary in whether they need long or wide format. The raw data also contains columns that are not needed for these analyses (e.g., location or photo date), so I will remove those to leave just the species, tag ID, and measurements.
Long format
library(tidyr)
#log transform entire data set
raw2 <- raw
raw2[, c(3:24)] <- log(raw2[, c(3:24)])
# this will create a data frame with species, ID, characteristic, method and value
df_long <- raw2 %>%
pivot_longer(
cols = ends_with(".dig") | ends_with(".man"),
names_to = c("characteristic", "method"),
names_sep = "\\.",
values_to = "value"
)
# method and characteristic are initially a character; need to be a factor
df_long$method <- as.factor(df_long$method)
df_long$characteristic <- as.factor(df_long$characteristic)
# one fish (AD20-082) is missing hand measurements; will remove
df_long <- df_long[df_long$ID !="AD20-082", ]
#create data frames that only have one measurement for BD and HD
## BD1 and HD1 for both digital and manual measurements
df_long2 <- df_long[-grep("BD2", df_long$characteristic),]
df_long2 <- df_long2[-grep("HD2", df_long2$characteristic),]#only contains BD1 & HD1
## BD1/HD1 for manual (since they only have one) and BD2/HD2 for digital
df_long3 <- df_long %>%
filter(
# Keep everything for Hand measurements
method == "man" |
# Keep all digital measurements except BD1 and HD1
!(method == "dig" & characteristic %in% c("BD1", "HD1"))
)
#to ensure comparisons between BD2 and BD1 in this new data frame, will combine characteristic names to just BD or HD
df_long3<- df_long3 %>%
mutate(
characteristic = case_when(
characteristic %in% c("BD1", "BD2") ~ "BD",
characteristic %in% c("HD1", "HD2") ~ "HD",
TRUE ~ characteristic # Leave other characteristics unchanged
)
)
Wide format
df_wide <- df_long2 %>%
pivot_wider(
names_from = method,
values_from = value
)
df_wide2 <- df_long3 %>%
pivot_wider(
names_from = method,
values_from = value
)
Results: no significant difference between hand and digital measurements when using BD2/HD2; a suggestive difference in method when using BD1/HD1 (p = 0.0584).
No effect of species (I didn't expect there to be, but just in case).
# NOTE: analysis requires long format
#### for BD1 and HD1
anova_results <- aov(value~method * SPP, data = df_long2)
summary(anova_results)
## Df Sum Sq Mean Sq F value Pr(>F)
## method 1 2 2.0122 3.584 0.0584 .
## SPP 2 2 0.9491 1.691 0.1845
## method:SPP 2 1 0.3224 0.574 0.5631
## Residuals 6774 3803 0.5614
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_results2 <- aov(value~method + SPP, data = df_long2)
summary(anova_results2)
## Df Sum Sq Mean Sq F value Pr(>F)
## method 1 2 2.0122 3.585 0.0584 .
## SPP 2 2 0.9491 1.691 0.1845
## Residuals 6776 3804 0.5613
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_results3 <- aov(value~method, data = df_long2)
summary(anova_results3)
## Df Sum Sq Mean Sq F value Pr(>F)
## method 1 2 2.0122 3.584 0.0584 .
## Residuals 6778 3805 0.5614
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### for BD2 and HD2 in digital, BD1 and HD1 in hand
anova_results4 <- aov(value~method * SPP, data = df_long3)
summary(anova_results4)
## Df Sum Sq Mean Sq F value Pr(>F)
## method 1 1 0.5344 0.931 0.335
## SPP 2 2 0.8128 1.416 0.243
## method:SPP 2 1 0.5398 0.941 0.390
## Residuals 6774 3888 0.5739
anova_results5 <- aov(value~method + SPP, data = df_long3)
summary(anova_results5)
## Df Sum Sq Mean Sq F value Pr(>F)
## method 1 1 0.5344 0.931 0.335
## SPP 2 2 0.8128 1.416 0.243
## Residuals 6776 3889 0.5739
anova_results6 <- aov(value~method, data = df_long3)
summary(anova_results6)
## Df Sum Sq Mean Sq F value Pr(>F)
## method 1 1 0.5344 0.931 0.335
## Residuals 6778 3890 0.5740
Used to assess agreement between two measurement methods.
This analysis will only use BD1 and HD1 for both digital and manual measurements.
# requires wide data frame
library(ggplot2)
#compute averages and differences for Bland-Altman
df_ba <- df_wide %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
#### Plot for whole dataset
ba_full <- ggplot(df_ba, aes(x = average, y = difference)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full
The Bland-Altman plot for the whole dataset shows distinct clusters. This is most likely caused by the differences in range among the characteristic types. We will look at species and characteristics to see which drives the clustering.
##### Species
ba_full_SPP <- ggplot(df_ba, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full_SPP
##### Characteristics
ba_full_CR <- ggplot(df_ba, aes(x = average, y = difference, color=characteristic)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full_CR
We indeed see that characteristics drive the clustering. We also see that the data for pre-dorsal length (PreDL) hover at and below the lower limit of agreement. This suggests either a measurement bias in one or both of the measurement methods, or a biological difference, where certain species, sizes, or characteristics might be harder to measure accurately with one method than the other.
We’ll take a look at Bland-Altman plots for each characteristic now.
ba_by_char <- ggplot(df_ba, aes(x = average, y = difference)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
facet_wrap(~ characteristic, scales = "free") +
theme_minimal()
ba_by_char
### Check if there are species differences
ba_CR_SPP <- ggplot(df_ba, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
facet_wrap(~ characteristic, scales = "free") +
theme_minimal()
ba_CR_SPP
As before, we see that PreDL sits at or below the lower limit of agreement. When colored by species, we don't see any distinct clustering causing this positioning. Likewise, there is no distinct species clustering in any of the characteristics, with only a handful of outliers in each. This may be because some individual fish are much larger than the others.
This analysis will use BD2 and HD2 for digital, but BD1 and HD1 for manual measurements.
# requires wide data frame
#compute averages and differences for Bland-Altman
df_ba2 <- df_wide2 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
#### Plot for whole dataset
ba_full2 <- ggplot(df_ba2, aes(x = average, y = difference)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full2
The Bland-Altman plot for the whole dataset shows distinct clusters. This is most likely caused by the differences in range among the characteristic types. We will look at species and characteristics to see which drives the clustering.
##### Species
ba_full_SPP2 <- ggplot(df_ba2, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full_SPP2
##### Characteristics
ba_full_CR2 <- ggplot(df_ba2, aes(x = average, y = difference, color=characteristic)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full_CR2
We indeed see that characteristics drive the clustering. We also see that the data for pre-dorsal length (PreDL) hover at and below the lower limit of agreement. This suggests either a measurement bias in one or both of the measurement methods, or a biological difference, where certain species, sizes, or characteristics might be harder to measure accurately with one method than the other.
We’ll take a look at Bland-Altman plots for each characteristic now.
ba_by_char2 <- ggplot(df_ba2, aes(x = average, y = difference)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
facet_wrap(~ characteristic, scales = "free") +
theme_minimal()
ba_by_char2
### Check if there are species differences
ba_CR_SPP2 <- ggplot(df_ba2, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
facet_wrap(~ characteristic, scales = "free") +
theme_minimal()
ba_CR_SPP2
We’re really only interested in how BD1, HD1, BD2, and HD2 compare, so let’s isolate those.
data_BD1 <- df_wide %>%
filter(characteristic == "BD1")
data_BD1 <- data_BD1 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
data_HD1 <- df_wide %>%
filter(characteristic == "HD1")
data_HD1 <- data_HD1 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
data_BD2 <- df_wide2 %>%
filter(characteristic == "BD")
data_BD2 <- data_BD2 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
data_HD2 <- df_wide2 %>%
filter(characteristic == "HD")
data_HD2 <- data_HD2 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
ba_BD1 <- ggplot(data_BD1, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(data_BD1$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(data_BD1$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_BD1$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "BD1",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_HD1 <- ggplot(data_HD1, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(data_HD1$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(data_HD1$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_HD1$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "HD1",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_BD2 <- ggplot(data_BD2, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(data_BD2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(data_BD2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_BD2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "BD2",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_HD2 <- ggplot(data_HD2, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(data_HD2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(data_HD2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_HD2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "HD2",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(ba_BD1, ba_BD2, ba_HD1, ba_HD2)
We see that using BD2 and HD2 increases the scatter and the number of points beyond the LOAs. This suggests that the digital BD2/HD2 distances do not match the hand measurements as well as BD1/HD1 do.
NOTE: the loop below isn't working because the structure of 'stats' does not contain the p-value (even though printing stats does show a p-value). Contacted the package author for more assistance.
unique_characteristics <- unique(df_long$characteristic)
statistics_list <- lapply(unique_characteristics, function(char) {
  # Filter data for the current characteristic (two rows, one for each method)
  df_char <- df_long %>% filter(characteristic == char)
  # Apply blandr.statistics to the digital and hand values
  stats <- blandr.statistics(
    x = df_char$value[df_char$method == "dig"],
    y = df_char$value[df_char$method == "man"]
  )
  # Extract the desired statistics from the results
  p_value <- stats$p_value
  mean_diff <- stats$bias
  lower_limit <- stats$lowerLOA
  upper_limit <- stats$upperLOA
  # Return a one-row data frame with the statistics for this characteristic
  return(data.frame(
    Characteristic = char,
    p_value = p_value,
    mean_difference = mean_diff,
    lower_limit = lower_limit,
    upper_limit = upper_limit
  ))
})
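Once the extraction issue is resolved, the resulting list of one-row data frames could be stacked into a single summary table; a minimal sketch with made-up rows:

```r
# Made-up stand-ins for two per-characteristic result rows.
statistics_list <- list(
  data.frame(Characteristic = "SL", p_value = 0.012),
  data.frame(Characteristic = "BD1", p_value = 0.200)
)

# Stack the one-row data frames into one table.
statistics_df <- do.call(rbind, statistics_list)
statistics_df
```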
Nearly all characteristics (except HD2 and OL) show a statistically significant bias. HOWEVER, the bias is extremely close to zero (suggesting a negligible difference between the methods), and the LOAs include 0 (which also suggests that no difference between the methods is plausible).
library(blandr)
## Warning: package 'blandr' was built under R version 4.4.2
### Whole dataset
(bar_full <- blandr.statistics(df_wide$dig, df_wide$man))
## Bland-Altman Statistics
## =======================
## t = -9.7766, df = 3389, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 3390
## Maximum value for average measures: 4.042436
## Minimum value for average measures: 0.3834636
## Maximum value for difference in measures: 0.5562302
## Minimum value for difference in measures: -0.9342704
##
## Bias: -0.03445528
## Standard deviation of bias: 0.2051953
##
## Standard error of bias: 0.003524258
## Standard error for limits of agreement: 0.00602359
##
## Bias: -0.03445528
## Bias- upper 95% CI: -0.02754539
## Bias- lower 95% CI: -0.04136516
##
## Upper limit of agreement: 0.3677276
## Upper LOA- upper 95% CI: 0.3795378
## Upper LOA- lower 95% CI: 0.3559173
##
## Lower limit of agreement: -0.4366381
## Lower LOA- upper 95% CI: -0.4248279
## Lower LOA- lower 95% CI: -0.4484484
##
## =======================
## Derived measures:
## Mean of differences/means: -2.604907
## Point estimate of bias as proportion of lowest average: -8.985281
## Point estimate of bias as proportion of highest average -0.8523395
## Spread of data between lower and upper LoAs: 0.8043657
## Bias as proportion of LoA spread: -4.283534
##
## =======================
## Bias:
## -0.03445528 ( -0.04136516 to -0.02754539 )
## ULoA:
## 0.3677276 ( 0.3559173 to 0.3795378 )
## LLoA:
## -0.4366381 ( -0.4484484 to -0.4248279 )
(bar_full2 <- blandr.statistics(df_wide2$dig, df_wide2$man))
## Bland-Altman Statistics
## =======================
## t = -4.3957, df = 3389, p-value = 1.138e-05
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 3390
## Maximum value for average measures: 4.042436
## Minimum value for average measures: 0.3834636
## Maximum value for difference in measures: 0.746832
## Minimum value for difference in measures: -0.9342704
##
## Bias: -0.01775692
## Standard deviation of bias: 0.2352018
##
## Standard error of bias: 0.004039623
## Standard error for limits of agreement: 0.006904443
##
## Bias: -0.01775692
## Bias- upper 95% CI: -0.009836577
## Bias- lower 95% CI: -0.02567727
##
## Upper limit of agreement: 0.4432387
## Upper LOA- upper 95% CI: 0.456776
## Upper LOA- lower 95% CI: 0.4297014
##
## Lower limit of agreement: -0.4787525
## Lower LOA- upper 95% CI: -0.4652152
## Lower LOA- lower 95% CI: -0.4922898
##
## =======================
## Derived measures:
## Mean of differences/means: -2.042943
## Point estimate of bias as proportion of lowest average: -4.630667
## Point estimate of bias as proportion of highest average -0.4392629
## Spread of data between lower and upper LoAs: 0.9219912
## Bias as proportion of LoA spread: -1.925932
##
## =======================
## Bias:
## -0.01775692 ( -0.02567727 to -0.009836577 )
## ULoA:
## 0.4432387 ( 0.4297014 to 0.456776 )
## LLoA:
## -0.4787525 ( -0.4922898 to -0.4652152 )
### By characteristic
df_SL <- df_wide %>%
filter(characteristic == "SL")
(bar_SL <- blandr.statistics(df_SL$dig, df_SL$man))
## Bland-Altman Statistics
## =======================
## t = 23.922, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 4.042436
## Minimum value for average measures: 2.960844
## Maximum value for difference in measures: 0.1785351
## Minimum value for difference in measures: -0.2129669
##
## Bias: 0.05387797
## Standard deviation of bias: 0.04146884
##
## Standard error of bias: 0.002252278
## Standard error for limits of agreement: 0.003852918
##
## Bias: 0.05387797
## Bias- upper 95% CI: 0.05830822
## Bias- lower 95% CI: 0.04944772
##
## Upper limit of agreement: 0.1351569
## Upper LOA- upper 95% CI: 0.1427356
## Upper LOA- lower 95% CI: 0.1275782
##
## Lower limit of agreement: -0.02740096
## Lower LOA- upper 95% CI: -0.01982224
## Lower LOA- lower 95% CI: -0.03497967
##
## =======================
## Derived measures:
## Mean of differences/means: 1.517899
## Point estimate of bias as proportion of lowest average: 1.819683
## Point estimate of bias as proportion of highest average 1.332809
## Spread of data between lower and upper LoAs: 0.1625579
## Bias as proportion of LoA spread: 33.14387
##
## =======================
## Bias:
## 0.05387797 ( 0.04944772 to 0.05830822 )
## ULoA:
## 0.1351569 ( 0.1275782 to 0.1427356 )
## LLoA:
## -0.02740096 ( -0.03497967 to -0.01982224 )
df_BD1 <- df_wide %>%
filter(characteristic == "BD1")
(bar_BD1 <- blandr.statistics(df_BD1$dig, df_BD1$man))
## Bland-Altman Statistics
## =======================
## t = 30.615, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 3.121966
## Minimum value for average measures: 1.748914
## Maximum value for difference in measures: 0.4951842
## Minimum value for difference in measures: -0.2418473
##
## Bias: 0.1207813
## Standard deviation of bias: 0.07263929
##
## Standard error of bias: 0.003945225
## Standard error for limits of agreement: 0.006749001
##
## Bias: 0.1207813
## Bias- upper 95% CI: 0.1285416
## Bias- lower 95% CI: 0.113021
##
## Upper limit of agreement: 0.2631543
## Upper LOA- upper 95% CI: 0.2764297
## Upper LOA- lower 95% CI: 0.249879
##
## Lower limit of agreement: -0.02159169
## Lower LOA- upper 95% CI: -0.00831636
## Lower LOA- lower 95% CI: -0.03486703
##
## =======================
## Derived measures:
## Mean of differences/means: 4.995667
## Point estimate of bias as proportion of lowest average: 6.906076
## Point estimate of bias as proportion of highest average 3.868758
## Spread of data between lower and upper LoAs: 0.284746
## Bias as proportion of LoA spread: 42.41721
##
## =======================
## Bias:
## 0.1207813 ( 0.113021 to 0.1285416 )
## ULoA:
## 0.2631543 ( 0.249879 to 0.2764297 )
## LLoA:
## -0.02159169 ( -0.03486703 to -0.00831636 )
df_BD2 <- df_wide2 %>%
filter(characteristic == "BD")
(bar_BD2 <- blandr.statistics(df_BD2$dig, df_BD2$man))
## Bland-Altman Statistics
## =======================
## t = 52.87, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 3.163316
## Minimum value for average measures: 1.876811
## Maximum value for difference in measures: 0.746832
## Minimum value for difference in measures: -0.09538079
##
## Bias: 0.3557936
## Standard deviation of bias: 0.1239048
##
## Standard error of bias: 0.006729585
## Standard error for limits of agreement: 0.01151214
##
## Bias: 0.3557936
## Bias- upper 95% CI: 0.3690308
## Bias- lower 95% CI: 0.3425565
##
## Upper limit of agreement: 0.598647
## Upper LOA- upper 95% CI: 0.6212915
## Upper LOA- lower 95% CI: 0.5760026
##
## Lower limit of agreement: 0.1129402
## Lower LOA- upper 95% CI: 0.1355847
## Lower LOA- lower 95% CI: 0.09029573
##
## =======================
## Derived measures:
## Mean of differences/means: 14.04186
## Point estimate of bias as proportion of lowest average: 18.95735
## Point estimate of bias as proportion of highest average 11.24749
## Spread of data between lower and upper LoAs: 0.4857068
## Bias as proportion of LoA spread: 73.25275
##
## =======================
## Bias:
## 0.3557936 ( 0.3425565 to 0.3690308 )
## ULoA:
## 0.598647 ( 0.5760026 to 0.6212915 )
## LLoA:
## 0.1129402 ( 0.09029573 to 0.1355847 )
df_CPD <- df_wide %>%
filter(characteristic == "CPD")
(bar_CPD <- blandr.statistics(df_CPD$dig, df_CPD$man))
## Bland-Altman Statistics
## =======================
## t = 27.399, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.471218
## Minimum value for average measures: 1.170403
## Maximum value for difference in measures: 0.5562302
## Minimum value for difference in measures: -0.1550393
##
## Bias: 0.101579
## Standard deviation of bias: 0.06825963
##
## Standard error of bias: 0.003707354
## Standard error for limits of agreement: 0.006342082
##
## Bias: 0.101579
## Bias- upper 95% CI: 0.1088714
## Bias- lower 95% CI: 0.09428661
##
## Upper limit of agreement: 0.2353679
## Upper LOA- upper 95% CI: 0.2478428
## Upper LOA- lower 95% CI: 0.222893
##
## Lower limit of agreement: -0.03220987
## Lower LOA- upper 95% CI: -0.01973495
## Lower LOA- lower 95% CI: -0.04468479
##
## =======================
## Derived measures:
## Mean of differences/means: 5.715623
## Point estimate of bias as proportion of lowest average: 8.678974
## Point estimate of bias as proportion of highest average 4.110483
## Spread of data between lower and upper LoAs: 0.2675778
## Bias as proportion of LoA spread: 37.96243
##
## =======================
## Bias:
## 0.101579 ( 0.09428661 to 0.1088714 )
## ULoA:
## 0.2353679 ( 0.222893 to 0.2478428 )
## LLoA:
## -0.03220987 ( -0.04468479 to -0.01973495 )
df_CPL <- df_wide %>%
filter(characteristic == "CPL")
(bar_CPL <- blandr.statistics(df_CPL$dig, df_CPL$man))
## Bland-Altman Statistics
## =======================
## t = 30.741, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.890711
## Minimum value for average measures: 1.76216
## Maximum value for difference in measures: 0.3986366
## Minimum value for difference in measures: -0.19231
##
## Bias: 0.1409026
## Standard deviation of bias: 0.08439069
##
## Standard error of bias: 0.004583473
## Standard error for limits of agreement: 0.007840837
##
## Bias: 0.1409026
## Bias- upper 95% CI: 0.1499183
## Bias- lower 95% CI: 0.1318869
##
## Upper limit of agreement: 0.3063084
## Upper LOA- upper 95% CI: 0.3217314
## Upper LOA- lower 95% CI: 0.2908854
##
## Lower limit of agreement: -0.02450314
## Lower LOA- upper 95% CI: -0.00908016
## Lower LOA- lower 95% CI: -0.03992613
##
## =======================
## Derived measures:
## Mean of differences/means: 5.963766
## Point estimate of bias as proportion of lowest average: 7.996017
## Point estimate of bias as proportion of highest average 4.874324
## Spread of data between lower and upper LoAs: 0.3308115
## Bias as proportion of LoA spread: 42.59302
##
## =======================
## Bias:
## 0.1409026 ( 0.1318869 to 0.1499183 )
## ULoA:
## 0.3063084 ( 0.2908854 to 0.3217314 )
## LLoA:
## -0.02450314 ( -0.03992613 to -0.00908016 )
df_PreDL <- df_wide %>%
filter(characteristic == "PreDL")
(bar_PreDL <- blandr.statistics(df_PreDL$dig, df_PreDL$man))
## Bland-Altman Statistics
## =======================
## t = -75.204, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 3.204808
## Minimum value for average measures: 2.05628
## Maximum value for difference in measures: 0.2376452
## Minimum value for difference in measures: -0.9116021
##
## Bias: -0.4623168
## Standard deviation of bias: 0.113188
##
## Standard error of bias: 0.006147531
## Standard error for limits of agreement: 0.01051643
##
## Bias: -0.4623168
## Bias- upper 95% CI: -0.4502246
## Bias- lower 95% CI: -0.4744091
##
## Upper limit of agreement: -0.2404683
## Upper LOA- upper 95% CI: -0.2197824
## Upper LOA- lower 95% CI: -0.2611542
##
## Lower limit of agreement: -0.6841654
## Lower LOA- upper 95% CI: -0.6634795
## Lower LOA- lower 95% CI: -0.7048513
##
## =======================
## Derived measures:
## Mean of differences/means: -17.19129
## Point estimate of bias as proportion of lowest average: -22.48317
## Point estimate of bias as proportion of highest average -14.42572
## Spread of data between lower and upper LoAs: 0.4436971
## Bias as proportion of LoA spread: -104.1965
##
## =======================
## Bias:
## -0.4623168 ( -0.4744091 to -0.4502246 )
## ULoA:
## -0.2404683 ( -0.2611542 to -0.2197824 )
## LLoA:
## -0.6841654 ( -0.7048513 to -0.6634795 )
df_DbL <- df_wide %>%
filter(characteristic == "DbL")
(bar_DbL <- blandr.statistics(df_DbL$dig, df_DbL$man))
## Bland-Altman Statistics
## =======================
## t = -2.8434, df = 338, p-value = 0.004736
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.786619
## Minimum value for average measures: 1.117524
## Maximum value for difference in measures: 0.2886718
## Minimum value for difference in measures: -0.9342704
##
## Bias: -0.01723212
## Standard deviation of bias: 0.1115855
##
## Standard error of bias: 0.006060494
## Standard error for limits of agreement: 0.01036754
##
## Bias: -0.01723212
## Bias- upper 95% CI: -0.005311085
## Bias- lower 95% CI: -0.02915316
##
## Upper limit of agreement: 0.2014755
## Upper LOA- upper 95% CI: 0.2218686
## Upper LOA- lower 95% CI: 0.1810825
##
## Lower limit of agreement: -0.2359398
## Lower LOA- upper 95% CI: -0.2155467
## Lower LOA- lower 95% CI: -0.2563328
##
## =======================
## Derived measures:
## Mean of differences/means: -1.003452
## Point estimate of bias as proportion of lowest average: -1.541991
## Point estimate of bias as proportion of highest average -0.6183882
## Spread of data between lower and upper LoAs: 0.4374153
## Bias as proportion of LoA spread: -3.939533
##
## =======================
## Bias:
## -0.01723212 ( -0.02915316 to -0.005311085 )
## ULoA:
## 0.2014755 ( 0.1810825 to 0.2218686 )
## LLoA:
## -0.2359398 ( -0.2563328 to -0.2155467 )
df_HL <- df_wide %>%
filter(characteristic == "HL")
(bar_HL <- blandr.statistics(df_HL$dig, df_HL$man))
## Bland-Altman Statistics
## =======================
## t = -16.151, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.565974
## Minimum value for average measures: 1.480514
## Maximum value for difference in measures: 0.528404
## Minimum value for difference in measures: -0.7045543
##
## Bias: -0.1433821
## Standard deviation of bias: 0.163451
##
## Standard error of bias: 0.008877438
## Standard error for limits of agreement: 0.01518642
##
## Bias: -0.1433821
## Bias- upper 95% CI: -0.1259201
## Bias- lower 95% CI: -0.1608441
##
## Upper limit of agreement: 0.1769818
## Upper LOA- upper 95% CI: 0.2068536
## Upper LOA- lower 95% CI: 0.14711
##
## Lower limit of agreement: -0.463746
## Lower LOA- upper 95% CI: -0.4338742
## Lower LOA- lower 95% CI: -0.4936178
##
## =======================
## Derived measures:
## Mean of differences/means: -7.262377
## Point estimate of bias as proportion of lowest average: -9.684617
## Point estimate of bias as proportion of highest average -5.587823
## Spread of data between lower and upper LoAs: 0.6407278
## Bias as proportion of LoA spread: -22.378
##
## =======================
## Bias:
## -0.1433821 ( -0.1608441 to -0.1259201 )
## ULoA:
## 0.1769818 ( 0.14711 to 0.2068536 )
## LLoA:
## -0.463746 ( -0.4936178 to -0.4338742 )
df_HD1 <- df_wide %>%
filter(characteristic == "HD1")
(bar_HD1 <- blandr.statistics(df_HD1$dig, df_HD1$man))
## Bland-Altman Statistics
## =======================
## t = 14.044, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.601825
## Minimum value for average measures: 1.451618
## Maximum value for difference in measures: 0.4229753
## Minimum value for difference in measures: -0.2031489
##
## Bias: 0.06017856
## Standard deviation of bias: 0.07889485
##
## Standard error of bias: 0.00428498
## Standard error for limits of agreement: 0.007330212
##
## Bias: 0.06017856
## Bias- upper 95% CI: 0.06860715
## Bias- lower 95% CI: 0.05174997
##
## Upper limit of agreement: 0.2148125
## Upper LOA- upper 95% CI: 0.229231
## Upper LOA- lower 95% CI: 0.2003939
##
## Lower limit of agreement: -0.09445534
## Lower LOA- upper 95% CI: -0.08003676
## Lower LOA- lower 95% CI: -0.1088739
##
## =======================
## Derived measures:
## Mean of differences/means: 3.007816
## Point estimate of bias as proportion of lowest average: 4.14562
## Point estimate of bias as proportion of highest average 2.312937
## Spread of data between lower and upper LoAs: 0.3092678
## Bias as proportion of LoA spread: 19.4584
##
## =======================
## Bias:
## 0.06017856 ( 0.05174997 to 0.06860715 )
## ULoA:
## 0.2148125 ( 0.2003939 to 0.229231 )
## LLoA:
## -0.09445534 ( -0.1088739 to -0.08003676 )
df_HD2 <- df_wide2 %>%
filter(characteristic == "HD")
(bar_HD2 <- blandr.statistics(df_HD2$dig, df_HD2$man))
## Bland-Altman Statistics
## =======================
## t = -1.3693, df = 338, p-value = 0.1718
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.566481
## Minimum value for average measures: 1.391822
## Maximum value for difference in measures: 0.4478337
## Minimum value for difference in measures: -0.2905665
##
## Bias: -0.007850176
## Standard deviation of bias: 0.1055546
##
## Standard error of bias: 0.005732938
## Standard error for limits of agreement: 0.009807199
##
## Bias: -0.007850176
## Bias- upper 95% CI: 0.003426556
## Bias- lower 95% CI: -0.01912691
##
## Upper limit of agreement: 0.1990368
## Upper LOA- upper 95% CI: 0.2183276
## Upper LOA- lower 95% CI: 0.179746
##
## Lower limit of agreement: -0.2147372
## Lower LOA- upper 95% CI: -0.1954463
## Lower LOA- lower 95% CI: -0.234028
##
## =======================
## Derived measures:
## Mean of differences/means: -0.4187308
## Point estimate of bias as proportion of lowest average: -0.5640215
## Point estimate of bias as proportion of highest average -0.3058731
## Spread of data between lower and upper LoAs: 0.413774
## Bias as proportion of LoA spread: -1.897213
##
## =======================
## Bias:
## -0.007850176 ( -0.01912691 to 0.003426556 )
## ULoA:
## 0.1990368 ( 0.179746 to 0.2183276 )
## LLoA:
## -0.2147372 ( -0.234028 to -0.1954463 )
df_SnL <- df_wide %>%
filter(characteristic == "SnL")
(bar_SnL <- blandr.statistics(df_SnL$dig, df_SnL$man))
## Bland-Altman Statistics
## =======================
## t = -22.082, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 1.547597
## Minimum value for average measures: 0.3834636
## Maximum value for difference in measures: 0.2909172
## Minimum value for difference in measures: -0.7671152
##
## Bias: -0.1897858
## Standard deviation of bias: 0.158242
##
## Standard error of bias: 0.008594523
## Standard error for limits of agreement: 0.01470244
##
## Bias: -0.1897858
## Bias- upper 95% CI: -0.1728803
## Bias- lower 95% CI: -0.2066913
##
## Upper limit of agreement: 0.1203684
## Upper LOA- upper 95% CI: 0.1492882
## Upper LOA- lower 95% CI: 0.0914486
##
## Lower limit of agreement: -0.49994
## Lower LOA- upper 95% CI: -0.4710202
## Lower LOA- lower 95% CI: -0.5288599
##
## =======================
## Derived measures:
## Mean of differences/means: -21.11482
## Point estimate of bias as proportion of lowest average: -49.49253
## Point estimate of bias as proportion of highest average -12.26326
## Spread of data between lower and upper LoAs: 0.6203085
## Bias as proportion of LoA spread: -30.59539
##
## =======================
## Bias:
## -0.1897858 ( -0.2066913 to -0.1728803 )
## ULoA:
## 0.1203684 ( 0.0914486 to 0.1492882 )
## LLoA:
## -0.49994 ( -0.5288599 to -0.4710202 )
df_OL <- df_wide %>%
filter(characteristic == "OL")
(bar_OL <- blandr.statistics(df_OL$dig, df_OL$man))
## Bland-Altman Statistics
## =======================
## t = -1.7645, df = 338, p-value = 0.07855
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 1.471467
## Minimum value for average measures: 0.6198813
## Maximum value for difference in measures: 0.4116851
## Minimum value for difference in measures: -0.3320279
##
## Bias: -0.009155384
## Standard deviation of bias: 0.09553187
##
## Standard error of bias: 0.005188579
## Standard error for limits of agreement: 0.008875977
##
## Bias: -0.009155384
## Bias- upper 95% CI: 0.001050588
## Bias- lower 95% CI: -0.01936136
##
## Upper limit of agreement: 0.1780871
## Upper LOA- upper 95% CI: 0.1955462
## Upper LOA- lower 95% CI: 0.160628
##
## Lower limit of agreement: -0.1963978
## Lower LOA- upper 95% CI: -0.1789387
## Lower LOA- lower 95% CI: -0.213857
##
## =======================
## Derived measures:
## Mean of differences/means: -0.6778975
## Point estimate of bias as proportion of lowest average: -1.476958
## Point estimate of bias as proportion of highest average -0.6221944
## Spread of data between lower and upper LoAs: 0.3744849
## Bias as proportion of LoA spread: -2.444794
##
## =======================
## Bias:
## -0.009155384 ( -0.01936136 to 0.001050588 )
## ULoA:
## 0.1780871 ( 0.160628 to 0.1955462 )
## LLoA:
## -0.1963978 ( -0.213857 to -0.1789387 )
df_predl <- df_wide %>%
filter(characteristic == "PreDL")
# Separate Digital and Hand values
digital_values <- df_predl$dig
hand_values <- df_predl$man
# Apply the blandr.statistics function
stats <- blandr.statistics(method1 = digital_values, method2 = hand_values)
# Extract the lower and upper limits of agreement
lower_limit <- stats$lowerLOA
upper_limit <- stats$upperLOA
# Calculate differences and filter for outliers
df_predl <- df_predl %>%
mutate(diff = dig - man) %>%
filter(diff < lower_limit | diff > upper_limit)
# View the outliers for PreDL
print(df_predl)
## # A tibble: 339 × 6
## ID SPP characteristic dig man diff
## <fct> <fct> <fct> <dbl> <dbl> <dbl>
## 1 AD19-001 p.lat PreDL 2.72 3.10 -0.384
## 2 AD19-002 p.lat PreDL 2.53 3.04 -0.506
## 3 AD19-003 p.lat PreDL 2.82 3.25 -0.433
## 4 AD19-004 p.lat PreDL 2.33 2.83 -0.500
## 5 AD19-005 p.lat PreDL 2.64 3.05 -0.406
## 6 AD19-006 p.lat PreDL 2.62 3.02 -0.405
## 7 AD19-007 p.lat PreDL 2.44 2.91 -0.466
## 8 AD19-008 p.lat PreDL 2.45 2.86 -0.412
## 9 AD19-009 p.lat PreDL 2.33 2.91 -0.574
## 10 AD19-010 p.lat PreDL 2.63 3.04 -0.414
## # ℹ 329 more rows
cat("Mean Difference:", mean(df_predl$diff), "\n")
## Mean Difference: -0.4623168
cat("Lower Limit:", lower_limit, "\n")
## Lower Limit: -0.2404683
cat("Upper Limit:", upper_limit, "\n")
## Upper Limit: -0.6841654
df_PreDL <- df_predl %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
ba_PreDL <- ggplot(df_PreDL, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_PreDL$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_PreDL$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_PreDL$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plot for PreDL",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_PreDL
library(blandr)
blandr.statistics(df_PreDL$dig, df_PreDL$man)
## Bland-Altman Statistics
## =======================
## t = -75.204, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 3.204808
## Minimum value for average measures: 2.05628
## Maximum value for difference in measures: 0.2376452
## Minimum value for difference in measures: -0.9116021
##
## Bias: -0.4623168
## Standard deviation of bias: 0.113188
##
## Standard error of bias: 0.006147531
## Standard error for limits of agreement: 0.01051643
##
## Bias: -0.4623168
## Bias- upper 95% CI: -0.4502246
## Bias- lower 95% CI: -0.4744091
##
## Upper limit of agreement: -0.2404683
## Upper LOA- upper 95% CI: -0.2197824
## Upper LOA- lower 95% CI: -0.2611542
##
## Lower limit of agreement: -0.6841654
## Lower LOA- upper 95% CI: -0.6634795
## Lower LOA- lower 95% CI: -0.7048513
##
## =======================
## Derived measures:
## Mean of differences/means: -17.19129
## Point estimate of bias as proportion of lowest average: -22.48317
## Point estimate of bias as proportion of highest average -14.42572
## Spread of data between lower and upper LoAs: 0.4436971
## Bias as proportion of LoA spread: -104.1965
##
## =======================
## Bias:
## -0.4623168 ( -0.4744091 to -0.4502246 )
## ULoA:
## -0.2404683 ( -0.2611542 to -0.2197824 )
## LLoA:
## -0.6841654 ( -0.7048513 to -0.6634795 )
In this analysis, I will compare the principal components after centering and scaling the data. A PCA will help us determine which methodology influences the variation in our data the most, without worrying about differences in scale between measurements.
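As a reminder of why the centering and scaling matters here (a toy sketch; the column names are made up), `scale()` gives every variable unit variance, so no measurement dominates the PCA just because of its raw units:

```r
# Two toy columns on very different scales (e.g. mm vs. pixels)
x <- data.frame(mm = c(45, 50, 48, 52), pix = c(4500, 5000, 4800, 5200))
z <- scale(x, center = TRUE, scale = TRUE)
apply(z, 2, sd)   # both columns now have sd = 1
```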
library(stringr)
z.scores <- raw2
z.scores[, c(3:24)] <- scale(z.scores[, 3:24], center = TRUE, scale = TRUE)
PCA <- prcomp(z.scores[, 3:24])
summary(PCA)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 4.182 1.21217 0.84996 0.7357 0.56329 0.5180 0.45818
## Proportion of Variance 0.795 0.06679 0.03284 0.0246 0.01442 0.0122 0.00954
## Cumulative Proportion 0.795 0.86180 0.89463 0.9192 0.93366 0.9458 0.95540
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.42037 0.41587 0.33673 0.32661 0.30076 0.28121 0.25212
## Proportion of Variance 0.00803 0.00786 0.00515 0.00485 0.00411 0.00359 0.00289
## Cumulative Proportion 0.96343 0.97129 0.97644 0.98129 0.98540 0.98900 0.99189
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.22165 0.19780 0.16764 0.16404 0.12458 0.10221 0.07463
## Proportion of Variance 0.00223 0.00178 0.00128 0.00122 0.00071 0.00047 0.00025
## Cumulative Proportion 0.99412 0.99590 0.99718 0.99840 0.99911 0.99958 0.99983
## PC22
## Standard deviation 0.06048
## Proportion of Variance 0.00017
## Cumulative Proportion 1.00000
(loadings <- PCA$rotation[, 1:5])
## PC1 PC2 PC3 PC4 PC5
## SL.dig -0.2347778 -0.096996721 0.01693687 -0.006176149 0.062160581
## BD1.dig -0.2284302 0.040438320 0.11828459 -0.082195318 -0.020081064
## BD2.dig -0.2265082 -0.204230055 0.04686845 -0.114485959 0.014698263
## CPD.dig -0.2331491 0.027815265 0.02843859 0.060726336 0.028640808
## CPL.dig -0.2233653 -0.122526759 -0.01765293 0.051407052 0.087380963
## PreDL.dig -0.1795926 -0.455754637 0.38328152 -0.035114502 0.036990294
## DbL.dig -0.1792735 0.492804564 0.15433728 0.093722366 0.132645722
## HL.dig -0.1881629 0.037841635 -0.69302734 -0.129545756 0.040276382
## HD1.dig -0.2314303 0.092691464 -0.15121241 -0.074290038 -0.049191518
## HD2.dig -0.2202611 0.126740445 -0.37372814 -0.114658545 -0.058170881
## SnL.dig -0.1887660 -0.209724253 -0.23206376 0.438310297 0.299520788
## OL.dig -0.2176019 -0.051613729 -0.08223996 -0.187395317 -0.246012805
## SL.man -0.2346467 -0.066007194 0.05419313 -0.013410667 0.043012548
## BD1.man -0.2233880 0.137447421 0.15671013 0.048596434 -0.036769031
## CPD.man -0.2275129 0.051199268 0.08915917 0.048228291 0.001670173
## CPL.man -0.2229489 -0.084361388 0.02254979 0.033420529 0.088728475
## PreDL.man -0.2160456 -0.280047719 0.02145309 -0.055310926 0.063545028
## DbL.man -0.1679576 0.536687777 0.19654399 0.059968129 0.115261547
## HL.man -0.2113179 0.033215809 0.05468238 -0.198535844 0.394539908
## HD1.man -0.2253717 0.081537288 0.14871885 -0.052250635 0.089215660
## SnL.man -0.1857245 -0.004219492 -0.02498576 0.725115394 -0.468553383
## OL.man -0.2025594 0.030649519 0.08639455 -0.347119680 -0.631563787
sorted.loadings.1 <- loadings[order(loadings[, 1]), 1]
myTitle <- "Loadings Plot for PC1"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.1, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.2 <- loadings[order(loadings[, 2]), 2]
myTitle <- "Loadings Plot for PC2"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.2, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.3 <- loadings[order(loadings[, 3]), 3]
myTitle <- "Loadings Plot for PC3"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.3, main=myTitle, xlab=myXlab, cex=1.5, col="red")
VM_PCA <- varimax(PCA$rotation[, 1:3])
VM_PCA
## $loadings
##
## Loadings:
## PC1 PC2 PC3
## SL.dig -0.116 -0.220
## BD1.dig -0.235 -0.112
## BD2.dig -0.302
## CPD.dig -0.190 -0.120
## CPL.dig -0.231
## PreDL.dig -0.493 0.376
## DbL.dig -0.469 0.276
## HL.dig 0.168 -0.698
## HD1.dig -0.142 -0.248
## HD2.dig -0.449
## SnL.dig -0.267 -0.231
## OL.dig -0.167 -0.147
## SL.man -0.151 -0.197
## BD1.man -0.303
## CPD.man -0.227 -0.101
## CPL.man -0.118 -0.203
## PreDL.man -0.354
## DbL.man -0.505 0.315
## HL.man -0.190 -0.104
## HD1.man -0.270
## SnL.man -0.115 -0.114
## OL.man -0.197 -0.102
##
## PC1 PC2 PC3
## SS loadings 1.000 1.000 1.000
## Proportion Var 0.045 0.045 0.045
## Cumulative Var 0.045 0.091 0.136
##
## $rotmat
## [,1] [,2] [,3]
## [1,] 0.6917396 0.60222957 0.3985170
## [2,] -0.5560565 0.79629540 -0.2381487
## [3,] -0.4607575 -0.05686105 0.8857027
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
(eig.val <- get_eigenvalue(PCA))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 17.490167246 79.50076021 79.50076
## Dim.2 1.469344948 6.67884067 86.17960
## Dim.3 0.722436480 3.28380218 89.46340
## Dim.4 0.541231220 2.46014191 91.92354
## Dim.5 0.317290132 1.44222787 93.36577
## Dim.6 0.268321832 1.21964469 94.58542
## Dim.7 0.209931770 0.95423532 95.53965
## Dim.8 0.176710702 0.80323046 96.34288
## Dim.9 0.172949491 0.78613405 97.12902
## Dim.10 0.113385668 0.51538940 97.64441
## Dim.11 0.106671751 0.48487160 98.12928
## Dim.12 0.090458276 0.41117398 98.54045
## Dim.13 0.079078144 0.35944611 98.89990
## Dim.14 0.063562571 0.28892078 99.18882
## Dim.15 0.049128830 0.22331286 99.41213
## Dim.16 0.039123834 0.17783561 99.58997
## Dim.17 0.028104434 0.12774743 99.71772
## Dim.18 0.026908568 0.12231167 99.84003
## Dim.19 0.015519246 0.07054203 99.91057
## Dim.20 0.010446972 0.04748624 99.95806
## Dim.21 0.005570293 0.02531951 99.98337
## Dim.22 0.003657591 0.01662541 100.00000
ind <- get_pca_ind(PCA)
head(ind$coord[,1:4])
## Dim.1 Dim.2 Dim.3 Dim.4
## 1 -5.4067751 0.4298672 0.4151608 0.68096887
## 2 -5.9873007 1.2851793 -0.4499961 0.35356493
## 3 -8.8447937 0.4505076 0.3553199 0.79382477
## 4 0.3842836 1.0995520 -0.4421054 0.15666553
## 5 -3.6094489 0.6935377 0.4998015 -0.39342915
## 6 -3.9868310 0.8667640 0.2054317 -0.03320378
raw.p <- cbind(z.scores, ind$coord[,1:4])
In order to compare dimensions 1-4 between manual and digital measurements, I need to run a PCA on digital-only and manual-only data sets, then perform a paired t-test.
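The paired comparison at the end of that plan might look like this (a sketch with toy PC1 scores; note that PC signs are arbitrary per `prcomp()` fit, so the two score sets may need sign-aligning before testing):

```r
# Toy PC1 scores for the same specimens from two separate PCAs
pc1_dig <- c(-5.41, -5.99, -8.84, 0.38, -3.61, -3.99)
pc1_man <- c(-5.35, -6.10, -8.70, 0.45, -3.70, -3.90)

# Flip the sign if one fit inverted the axis, then run the paired t-test
if (cor(pc1_dig, pc1_man) < 0) pc1_man <- -pc1_man
res <- t.test(pc1_dig, pc1_man, paired = TRUE)
res$p.value
```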
Let’s create the data sets
dig_p_long <- z.scores %>%
select(ID, SPP, ends_with(".dig"))
man_p_long <- z.scores %>%
select(ID, SPP, ends_with(".man"))
Now let’s run the individual PCAs
PCA_dig <- prcomp(dig_p_long[, 3:14])
summary(PCA_dig)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.1141 0.97826 0.74710 0.56729 0.41175 0.34804 0.25763
## Proportion of Variance 0.8081 0.07975 0.04651 0.02682 0.01413 0.01009 0.00553
## Cumulative Proportion 0.8081 0.88787 0.93438 0.96120 0.97533 0.98542 0.99096
## PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.21796 0.19369 0.11394 0.07637 0.06855
## Proportion of Variance 0.00396 0.00313 0.00108 0.00049 0.00039
## Cumulative Proportion 0.99491 0.99804 0.99912 0.99961 1.00000
(loadings_dig <- PCA_dig$rotation[, 1:5])
## PC1 PC2 PC3 PC4 PC5
## SL.dig -0.3169909 -0.10611458 0.060604567 -0.02354075 0.18176533
## BD1.dig -0.3049862 0.01844403 0.250670038 -0.08937686 -0.10696578
## BD2.dig -0.3063687 -0.22926027 0.015047023 -0.21325192 0.09067490
## CPD.dig -0.3113160 0.02702292 0.134490705 0.06974432 0.10056312
## CPL.dig -0.3033025 -0.12560059 0.020502291 -0.01630232 0.47718294
## PreDL.dig -0.2422550 -0.64531966 0.171752342 -0.09719333 0.02497410
## DbL.dig -0.2321796 0.50569082 0.572575906 0.32709467 0.04365808
## HL.dig -0.2625901 0.30850000 -0.607101208 -0.21853627 0.16435216
## HD1.dig -0.3120389 0.17140191 0.004180317 -0.07011570 -0.05451678
## HD2.dig -0.2995263 0.29442543 -0.204803116 -0.16644442 0.01982239
## SnL.dig -0.2592646 -0.18806057 -0.382995034 0.84022275 -0.14244593
## OL.dig -0.2964414 -0.02385058 -0.014007596 -0.19509333 -0.81011359
dig.sorted.loadings.1 <- loadings_dig[order(loadings_dig[, 1]), 1]
dig.myTitle <- "Loadings Plot for PC1"
dig.myXlab <- "Variable Loadings"
dotchart(dig.sorted.loadings.1, main=dig.myTitle, xlab=dig.myXlab, cex=1.5, col="red")
dig.sorted.loadings.2 <- loadings_dig[order(loadings_dig[, 2]), 2]
dig.myTitle <- "Loadings Plot for PC2"
dig.myXlab <- "Variable Loadings"
dotchart(dig.sorted.loadings.2, main=dig.myTitle, xlab=dig.myXlab, cex=1.5, col="red")
dig.sorted.loadings.3 <- loadings_dig[order(loadings_dig[, 3]), 3]
dig.myTitle <- "Loadings Plot for PC3"
dig.myXlab <- "Variable Loadings"
dotchart(dig.sorted.loadings.3, main=dig.myTitle, xlab=dig.myXlab, cex=1.5, col="red")
# Varimax on the digital PCA's loadings. (The original call here used the
# combined PCA's rotation by mistake, which is why the output below still
# lists both .dig and .man variables.)
dig.VM_PCA <- varimax(PCA_dig$rotation[, 1:3])
dig.VM_PCA
## $loadings
##
## Loadings:
## PC1 PC2 PC3
## SL.dig -0.116 -0.220
## BD1.dig -0.235 -0.112
## BD2.dig -0.302
## CPD.dig -0.190 -0.120
## CPL.dig -0.231
## PreDL.dig -0.493 0.376
## DbL.dig -0.469 0.276
## HL.dig 0.168 -0.698
## HD1.dig -0.142 -0.248
## HD2.dig -0.449
## SnL.dig -0.267 -0.231
## OL.dig -0.167 -0.147
## SL.man -0.151 -0.197
## BD1.man -0.303
## CPD.man -0.227 -0.101
## CPL.man -0.118 -0.203
## PreDL.man -0.354
## DbL.man -0.505 0.315
## HL.man -0.190 -0.104
## HD1.man -0.270
## SnL.man -0.115 -0.114
## OL.man -0.197 -0.102
##
## PC1 PC2 PC3
## SS loadings 1.000 1.000 1.000
## Proportion Var 0.045 0.045 0.045
## Cumulative Var 0.045 0.091 0.136
##
## $rotmat
## [,1] [,2] [,3]
## [1,] 0.6917396 0.60222957 0.3985170
## [2,] -0.5560565 0.79629540 -0.2381487
## [3,] -0.4607575 -0.05686105 0.8857027
(dig.eig.val <- get_eigenvalue(PCA_dig))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 9.697460424 80.81217020 80.81217
## Dim.2 0.956988897 7.97490747 88.78708
## Dim.3 0.558155419 4.65129516 93.43837
## Dim.4 0.321819658 2.68183049 96.12020
## Dim.5 0.169536257 1.41280214 97.53301
## Dim.6 0.121130526 1.00942105 98.54243
## Dim.7 0.066371341 0.55309451 99.09552
## Dim.8 0.047504771 0.39587309 99.49139
## Dim.9 0.037517676 0.31264730 99.80404
## Dim.10 0.012983044 0.10819203 99.91223
## Dim.11 0.005832920 0.04860766 99.96084
## Dim.12 0.004699067 0.03915889 100.00000
dig.ind <- get_pca_ind(PCA_dig)
head(dig.ind$coord[,1:4])
## Dim.1 Dim.2 Dim.3 Dim.4
## 1 -3.877405 -0.1360907 0.21810368 0.6037004
## 2 -4.350838 1.0993666 0.01943977 0.6087639
## 3 -6.089192 0.1481213 0.42598106 0.5018142
## 4 0.215744 0.9684140 -0.13146684 0.9619029
## 5 -2.633685 0.2573236 0.58017122 0.2108573
## 6 -3.078674 0.3356335 0.19729525 0.6931265
dig.p <- cbind(dig_p_long, dig.ind$coord[,1:4])
PCA_man <- prcomp(man_p_long[, 3:12])
summary(PCA_man)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.8377 0.80736 0.67359 0.5225 0.42289 0.34999 0.30348
## Proportion of Variance 0.8052 0.06518 0.04537 0.0273 0.01788 0.01225 0.00921
## Cumulative Proportion 0.8052 0.87042 0.91580 0.9431 0.96098 0.97323 0.98244
## PC8 PC9 PC10
## Standard deviation 0.28419 0.27405 0.14043
## Proportion of Variance 0.00808 0.00751 0.00197
## Cumulative Proportion 0.99052 0.99803 1.00000
(loadings_man <- PCA_man$rotation[, 1:5])
## PC1 PC2 PC3 PC4 PC5
## SL.man -0.3448760 -0.159264090 0.028818596 0.08348653 0.17222913
## BD1.man -0.3336431 0.171772379 -0.041810129 -0.02260661 0.23746504
## CPD.man -0.3381457 0.020375133 -0.022859381 0.02968760 0.21184581
## CPL.man -0.3287681 -0.213919604 -0.004339041 0.15099603 0.37304666
## PreDL.man -0.3128472 -0.481495839 0.092930271 0.10276539 0.13586031
## DbL.man -0.2603841 0.806413535 -0.039923819 0.07691051 0.12592852
## HL.man -0.3152071 0.005916009 0.329640553 0.42089544 -0.72541972
## HD1.man -0.3367866 0.089004745 0.107331634 0.10025382 -0.06396107
## SnL.man -0.2773374 -0.093007562 -0.874263032 -0.13518487 -0.35690086
## OL.man -0.3032190 -0.014288942 0.319579808 -0.86422398 -0.19464322
man.sorted.loadings.1 <- loadings_man[order(loadings_man[, 1]), 1]
man.myTitle <- "Loadings Plot for PC1"
man.myXlab <- "Variable Loadings"
dotchart(man.sorted.loadings.1, main=man.myTitle, xlab=man.myXlab, cex=1.5, col="red")
man.sorted.loadings.2 <- loadings_man[order(loadings_man[, 2]), 2]
man.myTitle <- "Loadings Plot for PC2"
man.myXlab <- "Variable Loadings"
dotchart(man.sorted.loadings.2, main=man.myTitle, xlab=man.myXlab, cex=1.5, col="red")
man.sorted.loadings.3 <- loadings_man[order(loadings_man[, 3]), 3]
man.myTitle <- "Loadings Plot for PC3"
man.myXlab <- "Variable Loadings"
dotchart(man.sorted.loadings.3, main=man.myTitle, xlab=man.myXlab, cex=1.5, col="red")
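The six sort-and-dotchart calls above repeat the same boilerplate for each component. A small helper could collapse them (`plot_loadings` is hypothetical, not part of the original notebook):

```r
# Sort one column of a loadings matrix and draw its dotchart.
plot_loadings <- function(loadings, pc, col = "red") {
  sorted <- loadings[order(loadings[, pc]), pc]
  dotchart(sorted,
           main = paste0("Loadings Plot for PC", pc),
           xlab = "Variable Loadings", cex = 1.5, col = col)
  invisible(sorted)
}

# e.g. plot_loadings(loadings_dig, 1); plot_loadings(loadings_man, 3)
```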
# Varimax on the manual PCA's loadings. (As above, the original call reused
# the combined PCA's rotation, so the output below duplicates the combined
# varimax rather than reflecting PCA_man.)
man.VM_PCA <- varimax(PCA_man$rotation[, 1:3])
man.VM_PCA
## $loadings
##
## Loadings:
## PC1 PC2 PC3
## SL.dig -0.116 -0.220
## BD1.dig -0.235 -0.112
## BD2.dig -0.302
## CPD.dig -0.190 -0.120
## CPL.dig -0.231
## PreDL.dig -0.493 0.376
## DbL.dig -0.469 0.276
## HL.dig 0.168 -0.698
## HD1.dig -0.142 -0.248
## HD2.dig -0.449
## SnL.dig -0.267 -0.231
## OL.dig -0.167 -0.147
## SL.man -0.151 -0.197
## BD1.man -0.303
## CPD.man -0.227 -0.101
## CPL.man -0.118 -0.203
## PreDL.man -0.354
## DbL.man -0.505 0.315
## HL.man -0.190 -0.104
## HD1.man -0.270
## SnL.man -0.115 -0.114
## OL.man -0.197 -0.102
##
## PC1 PC2 PC3
## SS loadings 1.000 1.000 1.000
## Proportion Var 0.045 0.045 0.045
## Cumulative Var 0.045 0.091 0.136
##
## $rotmat
## [,1] [,2] [,3]
## [1,] 0.6917396 0.60222957 0.3985170
## [2,] -0.5560565 0.79629540 -0.2381487
## [3,] -0.4607575 -0.05686105 0.8857027
library(factoextra)
(man.eig.val <- get_eigenvalue(PCA_man))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 8.05239960 80.5239960 80.52400
## Dim.2 0.65182332 6.5182332 87.04223
## Dim.3 0.45372921 4.5372921 91.57952
## Dim.4 0.27302458 2.7302458 94.30977
## Dim.5 0.17883881 1.7883881 96.09816
## Dim.6 0.12249607 1.2249607 97.32312
## Dim.7 0.09210306 0.9210306 98.24415
## Dim.8 0.08076290 0.8076290 99.05178
## Dim.9 0.07510089 0.7510089 99.80278
## Dim.10 0.01972155 0.1972155 100.00000
man.ind <- get_pca_ind(PCA_man)
head(man.ind$coord[,1:4])
## Dim.1 Dim.2 Dim.3 Dim.4
## 1 -3.7784207 0.6451182 -0.396593632 0.06849222
## 2 -4.1217187 0.6603292 0.008902533 0.87860206
## 3 -6.4466569 0.1376030 -0.378551677 0.95660998
## 4 0.3309617 0.6450504 0.446021000 1.18793061
## 5 -2.4644018 0.5773132 0.605969577 0.29865214
## 6 -2.5445684 0.9381672 0.511011644 0.61730956
man.p <- cbind(man_p_long, man.ind$coord[,1:4])
Now let’s do the paired t-tests. One caveat: prcomp() centers the scores, so each component has mean zero in both data sets by construction, and a paired t-test of mean differences will return p ≈ 1 regardless of how well individual specimens agree.
(pc1 <- t.test(dig.p$Dim.1, man.p$Dim.1, paired = TRUE, alternative = "two.sided"))
##
## Paired t-test
##
## data: dig.p$Dim.1 and man.p$Dim.1
## t = 7.2114e-16, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.08315722 0.08315722
## sample estimates:
## mean difference
## 3.048687e-17
(pc2 <- t.test(dig.p$Dim.2, man.p$Dim.2, paired = TRUE, alternative = "two.sided"))
##
## Paired t-test
##
## data: dig.p$Dim.2 and man.p$Dim.2
## t = -2.1474e-15, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.07233049 0.07233049
## sample estimates:
## mean difference
## -7.896446e-17
(pc3 <- t.test(dig.p$Dim.3, man.p$Dim.3, paired = TRUE, alternative = "two.sided"))
##
## Paired t-test
##
## data: dig.p$Dim.3 and man.p$Dim.3
## t = 4.112e-17, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.1053799 0.1053799
## sample estimates:
## mean difference
## 2.202945e-18
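Because prcomp() centers each component, the mean differences above are ~0 by construction, so these tests say little about per-specimen agreement. A correlation between the two methods' scores on the same component may be more informative. A sketch (`compare_scores` is a hypothetical helper; in the notebook it would take, e.g., `dig.p$Dim.1` and `man.p$Dim.1`):

```r
# Correlation between the two methods' scores on one component.
# PC signs are arbitrary, so the absolute correlation is reported too.
compare_scores <- function(dig_scores, man_scores) {
  r <- cor(dig_scores, man_scores)
  c(r = r, abs_r = abs(r))
}

# e.g. compare_scores(dig.p$Dim.1, man.p$Dim.1)
```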
library(AMR)
library(ggplot2)
library(ggfortify)
library(ggbiplot)
##
## Attaching package: 'ggbiplot'
## The following object is masked from 'package:ggfortify':
##
## ggbiplot
wes.yel <- "#E1AF00"
wes.blu <- "#78B7C5"
plot1 <- autoplot(PCA, data = z.scores, colour = "SPP",
                  loadings = FALSE, loadings.label = FALSE,
                  frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#856A9C")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#856A9C")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
#ggsave(plot1, filename = "pc1v2.png", bg="transparent", width = 180, height = 120, units = "mm", dpi = 300)
plot5 <- autoplot(PCA, x = 2, y = 3, data = z.scores, colour = "SPP",
                  loadings = FALSE, loadings.label = FALSE,
                  frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#856A9C")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#856A9C")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot5
#ggsave(plot5, filename = "pc2v3.png", bg="transparent", width = 180, height = 120, units = "mm", dpi = 300)
To color by method, I need to combine the scores from my individual PCAs. One caveat: scores from separately fitted PCAs live in different component spaces (and each PC’s sign is arbitrary), so overlaying them gives only a rough visual comparison of the two methods.
#data frame with just BD1 and HD1 for dig
dig_p_long2 <- z.scores %>%
  select(ID, SPP, ends_with(".dig"))
dig_p_long2 <- subset(dig_p_long2, select = -c(BD2.dig, HD2.dig))
#data frame with just BD2 and HD2 for dig
dig_p_long3 <- z.scores %>%
  select(ID, SPP, ends_with(".dig"))
dig_p_long3 <- subset(dig_p_long3, select = -c(BD1.dig, HD1.dig))
PCA_dig <- prcomp(dig_p_long2[, 3:12])
#PC scores for digital (BD1/HD1)
PCA.scores.dig <- as.data.frame(PCA_dig$x)
PCA.scores.dig$method <- "dig"
PCA.scores.dig$ID <- dig_p_long2$ID
PCA.scores.dig$SPP <- dig_p_long2$SPP
PCA_dig2 <- prcomp(dig_p_long3[, 3:12])
#PC scores for digital (BD2/HD2)
PCA.scores.dig2 <- as.data.frame(PCA_dig2$x)
PCA.scores.dig2$method <- "dig"
PCA.scores.dig2$ID <- dig_p_long3$ID
PCA.scores.dig2$SPP <- dig_p_long3$SPP
#PC scores for manual
PCA.scores.man <- as.data.frame(PCA_man$x)
PCA.scores.man$method <- "man"
PCA.scores.man$ID <- man_p_long$ID
PCA.scores.man$SPP <- man_p_long$SPP
PCA.comb <- rbind(PCA.scores.dig, PCA.scores.man)
PCA.comb2 <- rbind(PCA.scores.dig2, PCA.scores.man)
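Before overlaying scores from two separately fitted PCAs, it may help to flip any manual component whose scores correlate negatively with the matching digital component, since PC signs are arbitrary. A sketch (`align_signs` is a hypothetical helper, not from the original notebook; here it would run on the score matrices before the rbind):

```r
# Flip any column of `man` whose scores correlate negatively with the
# matching column of `dig`, so both score sets share an orientation.
align_signs <- function(dig, man, k = ncol(dig)) {
  for (j in seq_len(k)) {
    if (cor(dig[, j], man[, j]) < 0) man[, j] <- -man[, j]
  }
  man
}
```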
ggplot(data = PCA.comb, aes(x = PC1, y = PC2, color = method)) +
  geom_point(alpha = 0.8, size = 3) +
  stat_ellipse(aes(group = method), level = 0.95) +
  scale_color_manual(values = c("#E1AF00", "#78B7C5")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5")) +
  labs(
    title = "PCA of Digital and Manual Measurements (BD/HD1)",
    x = "Principal Component 1",
    y = "Principal Component 2",
    color = "Method"
  ) +
  theme_minimal()
ggplot(data = PCA.comb, aes(x = PC2, y = PC3, color = method)) +
  geom_point(alpha = 0.8, size = 3) +
  stat_ellipse(aes(group = method), level = 0.95) +
  scale_color_manual(values = c("#E1AF00", "#78B7C5")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5")) +
  labs(
    title = "PCA of Digital and Manual Measurements (BD/HD1)",
    x = "Principal Component 2",
    y = "Principal Component 3",
    color = "Method"
  ) +
  theme_minimal()
ggplot(data = PCA.comb2, aes(x = PC1, y = PC2, color = method)) +
  geom_point(alpha = 0.8, size = 3) +
  stat_ellipse(aes(group = method), level = 0.95) +
  scale_color_manual(values = c("#E1AF00", "#78B7C5")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5")) +
  labs(
    title = "PCA of Digital and Manual Measurements (BD/HD2)",
    x = "Principal Component 1",
    y = "Principal Component 2",
    color = "Method"
  ) +
  theme_minimal()
ggplot(data = PCA.comb2, aes(x = PC2, y = PC3, color = method)) +
  geom_point(alpha = 0.8, size = 3) +
  stat_ellipse(aes(group = method), level = 0.95) +
  scale_color_manual(values = c("#E1AF00", "#78B7C5")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5")) +
  labs(
    title = "PCA of Digital and Manual Measurements (BD/HD2)",
    x = "Principal Component 2",
    y = "Principal Component 3",
    color = "Method"
  ) +
  theme_minimal()